
[State Sync] Add server side implementation of semi-intelligent syncing mode. #5742

Merged: 1 commit into main on Dec 6, 2022

Conversation

@JoshLind (Contributor) commented Dec 1, 2022

Note: most of this PR is just boilerplate and tests. There's probably only 30 lines of interesting code.

Description

This PR adds the server side implementation of a new "semi-intelligent" state syncing mode, where the client (syncing node) requests chunks of transactions or outputs from a peer, and the peer decides which data to return based on whichever is quicker: (i) downloading and applying the outputs; or (ii) downloading and executing the transactions. The trade-off is that outputs are quick to apply (but might require lots of network data), and transactions are slow to execute (but might require much less network data).

The storage server on the peer decides which data to send per chunk using a simple value provided by the client on the request (max_num_output_reductions), which tells the server that if it needs to reduce the number of requested outputs more than max_num_output_reductions times, it should return transactions instead. For example: a client requests 2000 outputs in a single chunk and specifies max_num_output_reductions = 2. The server will attempt to serve this data, and if it needs to reduce (halve) the outputs more than 2 times (i.e., send fewer than 500 outputs), it will instead fall back to sending transactions for the chunk.
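The reduction policy described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function and parameter names (choose_chunk, output_size, max_frame_size) are assumptions, and the real server fetches proofs from storage rather than taking a size function.

```rust
/// Illustrative sketch of the server-side decision: halve the requested
/// output count each time the chunk overflows the network frame; after
/// more than `max_num_output_reductions` halvings, fall back to serving
/// transactions for the chunk instead.
fn choose_chunk(
    mut num_outputs_to_fetch: u64,
    max_num_output_reductions: u64,
    max_frame_size: u64,
    output_size: impl Fn(u64) -> u64, // assumed: serialized size of a chunk of N outputs
) -> &'static str {
    let mut num_output_reductions = 0;
    while num_output_reductions <= max_num_output_reductions {
        if output_size(num_outputs_to_fetch) <= max_frame_size {
            return "outputs"; // Chunk fits in a frame: serve outputs
        } else if num_outputs_to_fetch == 1 {
            break; // Cannot return fewer than one item; fall back
        } else {
            num_outputs_to_fetch /= 2; // Halve the chunk and retry
            num_output_reductions += 1;
        }
    }
    "transactions" // Too many reductions: serve transactions instead
}
```

With the example from the description (2000 outputs, max_num_output_reductions = 2), a frame that only fits 400 outputs forces a third halving and triggers the transaction fallback, while a frame that fits 500 outputs is served after exactly two halvings.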

Some notes:

  • max_num_output_reductions is configurable so that we can tune it. The current default is 2.
  • This change is backward compatible: client requests and responses are simple enums, and the two new types are appended to each, so serialization should remain backward compatible. Will also confirm (manually) that serialization handles this.
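The backward-compatibility argument rests on how the serialization format encodes enums: formats like BCS identify a variant by its declaration index, so appending new variants leaves the indices of existing variants unchanged. A minimal sketch (the enum and variant names here are illustrative, not the PR's actual types):

```rust
/// Illustrative request enum: existing variants keep their indices
/// (0 and 1) when a new variant is appended at the end, so old peers
/// can still decode the existing request types.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum DataRequest {
    GetTransactionsWithProof,          // index 0 (existing)
    GetTransactionOutputsWithProof,    // index 1 (existing)
    GetTransactionsOrOutputsWithProof, // index 2 (appended, new)
}
```

An old peer that receives an unknown variant index simply fails to decode that one request, while all previously defined requests continue to round-trip unchanged.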

Test Plan

New and existing tests!

@@ -1244,6 +1321,63 @@ impl StorageReaderInterface for StorageReader {
)))
}

fn get_transactions_or_outputs_with_proof(
@JoshLind (author):

FYI: this is the new logic. Everything else is boilerplate to handle the new request types 😄

@bchocho (Contributor) left a comment:

LGTM with a couple questions

// doesn't fit, return a transaction chunk instead.
let mut num_output_reductions = 0;
while num_output_reductions <= max_num_output_reductions {
let output_list_with_proof = self
@bchocho (Contributor):
Is there any concern calling this multiple times will be expensive?

@JoshLind (author) replied Dec 6, 2022:
Yeah, we're currently doing this for all storage calls where the data overflows the max frame size. At worst, I've seen it take a couple of seconds (<= 2). There's a task on the storage side to provide better APIs so we don't have to call into storage like this; once those land, we can migrate.

} else if num_outputs_to_fetch == 1 {
break; // We cannot return less than a single item. Fallback to transactions
} else {
increment_network_frame_overflow(
@bchocho (Contributor):
This will mean there can be more overflows than responses (because multiple overflows per response). Is this the intention?

@JoshLind (author) replied Dec 6, 2022:
Yeah, this is fine. We have other metrics to track the number (and types of) requests and responses, so from those we can calculate the average number of overflows/retries per request. Does that make sense?

@JoshLind JoshLind enabled auto-merge (rebase) December 6, 2022 13:26

github-actions bot commented Dec 6, 2022:

✅ Forge suite land_blocking success on c72d40ea44325482010ad348bd9b116cbf8d0b0c

performance benchmark with full nodes: 6929 TPS, 5715 ms latency, 7800 ms p99 latency, (!) expired 600 out of 2959440 txns
Test Ok

github-actions bot commented Dec 6, 2022:

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> c72d40ea44325482010ad348bd9b116cbf8d0b0c

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> c72d40ea44325482010ad348bd9b116cbf8d0b0c (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7224 TPS, 5283 ms latency, 8400 ms p99 latency, no expired txns
2. Upgrading first Validator to new version: c72d40ea44325482010ad348bd9b116cbf8d0b0c
compatibility::simple-validator-upgrade::single-validator-upgrade : 4484 TPS, 9095 ms latency, 12400 ms p99 latency, no expired txns
3. Upgrading rest of first batch to new version: c72d40ea44325482010ad348bd9b116cbf8d0b0c
compatibility::simple-validator-upgrade::half-validator-upgrade : 4706 TPS, 8626 ms latency, 10300 ms p99 latency, no expired txns
4. Upgrading second batch to new version: c72d40ea44325482010ad348bd9b116cbf8d0b0c
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6970 TPS, 5520 ms latency, 9800 ms p99 latency, no expired txns
5. Check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> c72d40ea44325482010ad348bd9b116cbf8d0b0c passed
Test Ok

@JoshLind JoshLind merged commit 76e743c into main Dec 6, 2022
@JoshLind JoshLind deleted the ss_new_mode_3 branch December 6, 2022 14:10